Abstract

On the occasion of Laurens de Haan's 70th birthday, we discuss two aspects of the sta- 
tistical inference on the extreme value behavior of time series with a particular emphasis 
on his important contributions. First, the performance of a direct marginal tail analysis 
is compared with that of a model-based approach using an analysis of residuals. Second, 
the importance of the extremal index as a measure of the serial extremal dependence is 
discussed by the example of solutions of a stochastic recurrence equation. 
1 Introduction

Since the publication of his Ph.D. thesis, Laurens de Haan has been one of the main
driving forces behind the impressive development of extreme value statistics in the last 
four decades. While he is best known for his seminal contributions to extreme value theory 
for i.i.d. samples of univariate and multivariate observations and, in recent years, for i.i.d. 
copies of continuous stochastic processes, he has also strongly influenced the extreme value 
theory (and practice) for serially dependent data in two ways: first by direct contributions, 
and second indirectly by promoting general principles. In the present paper, both aspects 
of the impact of Laurens de Haan's work on the development of extreme value statistics 
for time series are discussed. 

Throughout his work, Laurens de Haan has always aimed at the greatest (reasonable) 
generality of the models under consideration. For example, while in many articles on 
univariate extreme value statistics it was assumed that exact generalized Pareto random 
variables (r.v.s), respectively, generalized extreme value r.v.s were observed, typically he 
only assumed that the underlying distribution belongs to the domain of attraction of 
some extreme value distribution. Under this much more general condition, he analyzed 
the consequences of this deviation from the ideal situation. The second order condition
that de Haan and Stadtmüller (1996) introduced and analyzed for that purpose is now the
generally accepted standard condition in this field (cf. (2.4) for a simplified version). (It is
worth mentioning that essentially the same condition has been independently suggested
by Pereira (1994).) Similarly, also in de Haan's work on multivariate extreme value
statistics it is not assumed that the observations are drawn from an exact extreme value 
distribution, but only that the underlying distribution belongs to the domain of attraction 
of such a distribution. Moreover, he always preferred weak smoothness conditions on the 
exponent measure pertaining to this multivariate extreme value distribution to restrictive 
parametric submodels of the natural infinite-dimensional extreme value model. With the 
improvement of the resulting nonparametric methods, in the last couple of years this 
approach has become more widely accepted as a reliable tool that is more robust than
parametric approaches. 

The extreme value estimators that were suggested and analyzed for univariate i.i.d. samples
by Laurens de Haan and many others can also be used for the marginal tail analysis
of stationary time series, but often their performance deteriorates because of the serial 
dependence between the observations. Therefore, in contrast to the aforementioned gen- 
eral trend towards weak model assumptions, in the literature on extreme value statistics 
for time series (and particularly linear time series) often an approach is favored in which 
a parametric serial dependence structure is assumed and estimators are considered which 
are based on a tail analysis of the (nearly independent) residuals after the parametric
model has been fitted. In the main Section 2, we will reassess some of the results
which seemingly show the underperformance of a direct extreme value analysis, which only
requires weak nonparametric assumptions on the dependence structure, relative to the
model-based approach.

Often one is interested not only in the marginal tail behavior but also in the extremal 
dependence structure. The literature on the dependence analysis is strongly dominated 
by the problem of estimating the extremal index, which describes the influence of the serial
dependence on the asymptotic behavior of maxima of consecutive observations. It is
somewhat surprising that, while in the last two decades statistical methods which are
based on exceedances (or order statistics) instead of block maxima have become much
more popular, this shift of focus is not reflected in the statistical theory of the extremal 
dependence structure. In Section 3, we will argue that the statistical inference of the 
extremal dependence structure should be put on a broader basis and exemplify this claim 
by the asymptotic behavior of naturally arising statistics that were analyzed by Laurens 
de Haan and co-authors in a specific time series model. 

Obviously, the extreme value statistics of time series is a field of research much too broad 
and diverse to be reviewed in a short article. For that reason, we decided to focus on the 
two above topics, knowing that this selection is largely a matter of taste. Important sub- 
fields which we will not discuss at all are, for instance, the extreme value inference under 
additional structural assumptions (e.g. for Markov chains) and the analysis of nonstation- 
ary or multivariate time series, among many other topics. We will also not discuss Laurens 
de Haan's contributions to the extreme value theory of continuous time processes, since 
he usually assumes that i.i.d. copies of the whole process are observed. Consequently, this 
theory is a natural extension of the theory for multivariate observations rather than the 
theory for time series and will thus be discussed in Michael Falk's contribution to this 
volume. 

Throughout this article, we will assume that $X_t$, $t \in \mathbb{Z}$, is a stationary time series with
marginal distribution function (d.f.) $F_X$ that belongs to the domain of attraction of some
extreme value distribution.

2 Marginal tail analysis: In models we trust? 

In this section we assume that only the tail behavior of the stationary marginal d.f. Fx 
is to be analyzed. To this end, estimators of tail parameters based on exceedances over 
high thresholds can be used which were developed for i.i.d. observations, but the serial 
dependence must be taken into account when the accuracy of these estimators is assessed 
(e.g. to construct confidence intervals). 

Roughly speaking, one can distinguish three different approaches: 

(i) One tries to identify independent clusters of exceedances and constructs a new data 
set by taking one observation (usually the cluster maximum) from each cluster. This 
way one obtains an i.i.d. sample whose tail behavior can be analyzed using standard 
techniques from classical extreme value statistics for i.i.d. data. 

(ii) In a nonparametric approach, one may also apply the classical tail estimators (orig- 
inally proposed for i.i.d. samples) directly to all exceedances observed in the time 
series. However, if one wants to construct confidence intervals, then one needs
results on their asymptotic behavior that hold true under mild assumptions on the
serial dependence structure.

(iii) Finally, in a semiparametric fashion, one can fit a parametric model of the serial 
dependence to the data and then one can try to infer the tail behavior of the time 
series from a suitable analysis of the residuals. This approach seems best suited 
for heavy-tailed linear time series for which the relationship between the tail of 
the stationary distribution and the tail of the distribution of the innovations is 
particularly simple. 

The declustering approach (i) is most appropriate if the time series consists of clearly
separable, short clusters of extreme events that preferably have a "physical" interpretation.
Nice examples are data sets of wave heights and other quantities describing sea
conditions that were analyzed by Laurens de Haan and co-authors in several publications. 
For example, starting with wave heights, wave periods and still water levels that were 
observed every 3 hours at some point near the Dutch coast, de Haan and de Ronde (1998) 
obtained nearly i.i.d. data by only considering the maximum of each coordinate in each 
storm. See Dekkers and de Haan (1989), and de Haan (1990) for further examples of that 
type. 

Unfortunately, in many applications either it is difficult to identify independent clusters of 
extremes, or the clusters are large so that it would be a great waste of information to use 
but one observation in each cluster. For example, time series of returns of some financial 
investment often exhibit long periods of high volatility during which several dependent 
exceedances occur. Moreover, declustering schemes often depend on certain subjective 
choices, and usually the influence of these choices on the accuracy of the tail analysis is 
difficult to assess. (A first attempt to overcome these problems was made by Ledford and
Tawn (2003).)

For these reasons, in the sequel we will focus on the nonparametric approach (ii) and the 
model-based semiparametric approach (iii). In particular, we will compare the accuracy 
and the robustness of resulting estimators of extreme quantiles in the case of heavy-tailed 
linear time series. 



2.1 Direct extreme value analysis 

Among all tail estimators, the asymptotic behavior of the Hill estimator

$$\hat\gamma_n := \frac{1}{k}\sum_{i=1}^{k}\log\frac{X_{n-i+1:n}}{X_{n-k:n}}$$

under serial dependence has been studied most thoroughly in the literature; here $X_{j:n}$ denotes
the jth smallest order statistic of $X_1,\ldots,X_n$. One of the first references is an unpublished
manuscript by Rootzen, Leadbetter and de Haan (1990), in which the asymptotic normality
of the Hill estimator is established under quite weak conditions, including strong
mixing of the time series. At about the same time, Hsing (1991) independently proved 
the asymptotic normality of the Hill estimator under comparable, but different structural 
assumptions on the serial dependence. Since then, the limit distribution of (variants of) 
the Hill estimator under serial dependence has been examined in several papers; see, e.g., 
Resnick and Starica (1997) and Novak (2002). 
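
For readers who want to experiment, the Hill estimator in its usual form, the average of the log-excesses of the k largest observations over the (k+1)-th largest one, can be sketched in a few lines of Python. The function name `hill_estimator` and the simulated i.i.d. Pareto sample are illustrative assumptions and not part of the studies cited above; under serial dependence the point estimate is computed in exactly the same way, and only the assessment of its accuracy changes.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator based on the k largest observations of x."""
    x = np.sort(np.asarray(x, dtype=float))   # ascending order statistics
    n = x.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[n - k - 1]                  # X_{n-k:n}
    if threshold <= 0:
        raise ValueError("the threshold X_{n-k:n} must be positive")
    top = x[n - k:]                           # X_{n-k+1:n}, ..., X_{n:n}
    return float(np.mean(np.log(top / threshold)))

# Illustration (assumed setup): i.i.d. Pareto data with extreme value index 0.5.
rng = np.random.default_rng(0)
u = 1.0 - rng.uniform(size=5000)              # uniform on (0, 1]
sample = u ** (-0.5)                          # P(sample > x) = x^(-2) for x >= 1
gamma_hat = hill_estimator(sample, k=500)
```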

Of course, in most applications, the extreme value index is not the primary object of 
interest, but for instance exceedance probabilities or extreme quantiles are to be estimated. 
Consequently, Rootzen, Leadbetter and de Haan (1990) also examined the asymptotic 
behavior of extreme quantile estimators. Moreover, more general statistics of the type

$$\sum_{i=1}^{n}\phi_n(X_i)$$

(with suitable functions $\phi_n$) were considered, which are nowadays known as tail array
sums. The asymptotic theory of tail array sums was further developed by Leadbetter and 
Rootzen (1993) and Leadbetter (1995). In a final version, this part of the manuscript was 
published in the article Rootzen, Leadbetter and de Haan (1998). 

The general results about the asymptotic normality of tail array sums proved a powerful 
tool. In particular, Rootzen (1995), who established the weak convergence of the empirical 
process

$$e_n(x) = \frac{1}{\sqrt{n\bar F_X(u_n)}}\sum_{i=1}^{n}\Big(1_{\{X_i>\sigma_n x+u_n\}}-\bar F_X(\sigma_n x+u_n)\Big)$$

(with $\bar F_X = 1 - F_X$) towards a Gaussian process under β-mixing (absolute regularity)
of the time series, used this result to verify the convergence of the finite dimensional
marginal distributions. (In the improved version Rootzen (2006), a similar result is also
established under the weaker assumption that the time series is strongly mixing.) Using
this convergence, Drees (2000) proved a weighted approximation of the pertaining tail
quantile process, from which one can easily conclude the asymptotic normality of a general
class of estimators of the extreme value index or of estimators of extreme quantiles; cf.
Drees (2000, 2002, 2003).



2.2 Model-based tail estimators 

Here we focus on linear time series models, because for this class the semiparametric 
model-based approach seems particularly promising. More precisely, we assume that the 
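time series allows a representation as a moving average of infinite order:

$$X_t = \sum_{j=-\infty}^{\infty}\psi_j Z_{t-j},\qquad t\in\mathbb{Z}. \tag{2.1}$$

Moreover, the i.i.d. innovations $Z_t$ are assumed to have balanced heavy tails, i.e. their
survival function $\bar F_Z$ satisfies

$$\bar F_Z(x) = x^{-1/\gamma}L(x),\qquad \lim_{x\to\infty}\frac{\bar F_Z(x)}{\bar F_Z(x)+F_Z(-x)} = p \tag{2.2}$$

for some $\gamma>0$, $p\in(0,1]$ and some slowly varying function $L$. Mikosch and Samorodnitsky
(2000) proved that

$$\lim_{x\to\infty}\frac{\bar F_X(x)}{\bar F_Z(x)} = \frac{1}{p}\sum_{j=-\infty}^{\infty}\Big(p\,|\psi_j|^{1/\gamma}1_{\{\psi_j>0\}}+(1-p)\,|\psi_j|^{1/\gamma}1_{\{\psi_j<0\}}\Big) \tag{2.3}$$

if $0<\sum_{j=-\infty}^{\infty}\psi_j^2<\infty$ for $\gamma<1/2$, and $0<\sum_{j=-\infty}^{\infty}|\psi_j|^{1/\gamma-\varepsilon}<\infty$ for some $\varepsilon>0$ in
the case $\gamma\ge 1/2$, and if $E(Z_t)=0$ for $\gamma<1$. (Under stronger conditions, similar results
were already established by Davis and Resnick (1985) and Datta and McCormick (1998),
among others.)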

In particular, the time series has the same extreme value index $\gamma$ as the innovations.
Hence, if one has estimated the coefficients $\psi_j$ and the time series model is invertible,
then it is natural to estimate $\gamma$ by applying the Hill estimator (or some other estimator
of the extreme value index) to the resulting residuals $\hat Z_t$, which are approximately i.i.d.
Moreover, if $p$ (or some estimate of it) is known, one may even calculate estimates of
exceedance probabilities $\bar F_X(x)$ over high thresholds $x$ or estimators of extreme quantiles
$F_X^{-1}(1-t)$ (for small $t>0$) from estimators of the corresponding quantities of the d.f.
$F_Z$ of the innovations, which in turn can be obtained from a tail analysis of the residuals.
This program has been worked out for the first time by Resnick and Starica (1997) for
the Hill estimator and autoregressive time series AR(m) of order $m<\infty$:

$$X_t = \sum_{i=1}^{m}\varphi_i X_{t-i}+Z_t,\qquad t\in\mathbb{Z}.$$

Let $\hat\varphi_i$, $1\le i\le m$, be estimators of the coefficients such that $d_n(\hat\varphi_i-\varphi_i)_{1\le i\le m}$ converges
to some nondegenerate distribution; here $d_n\to\infty$ determines the rate of convergence of
the estimators $\hat\varphi_i$. Define the residuals

$$\hat Z_t := X_t-\sum_{i=1}^{m}\hat\varphi_i X_{t-i},\qquad m+1\le t\le n.$$
Resnick and Starica (1997) proved that the Hill estimator based on the absolute residuals

$$\hat\gamma_{n,|\hat Z|} := \frac{1}{k}\sum_{i=1}^{k}\log\frac{|\hat Z|_{n-m-i+1:n-m}}{|\hat Z|_{n-m-k:n-m}}$$

(with $|\hat Z|_{j:n-m}$ denoting the jth smallest order statistic of $|\hat Z_t|$, $m+1\le t\le n$) is
asymptotically normal,

$$\sqrt{k}\,\big(\hat\gamma_{n,|\hat Z|}-\gamma\big)\longrightarrow N(0,\gamma^2)$$



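weakly, provided that the d.f. $F_{|Z|}$ of the absolute innovations satisfies the second order
condition

$$\lim_{t\to\infty}\frac{\frac{\bar F_{|Z|}(tx)}{\bar F_{|Z|}(t)}\,x^{1/\gamma}-1}{A(t)} = \frac{x^{\rho}-1}{\rho} \tag{2.4}$$

for some $\rho<0$, and the number $k$ of order statistics used for estimation tends to $\infty$
sufficiently slowly such that

$$\lim_{n\to\infty}\sqrt{k}\,A\big(F_{|Z|}^{-1}(1-k/n)\big) = 0 \tag{2.5}$$

and

$$\lim_{n\to\infty}\frac{\sqrt{k}\,F_{|Z|}^{-1}(1-\sqrt{k}/n)}{d_n} = 0. \tag{2.6}$$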

(Ling and Peng (2004), who established a similar result for ARMA time series, showed
that condition (2.6) is not needed if $d_n$ equals the best attainable rate.)
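
The residual-based procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by Resnick and Starica (1997): least squares is just one possible choice of coefficient estimator, and the function names, the simulated AR(1) series and the choice k = 500 are hypothetical.

```python
import numpy as np

def fit_ar_least_squares(x, m):
    """Least-squares estimates of AR(m) coefficients and the resulting residuals."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Column i-1 holds the lagged values X_{t-i} for t = m+1, ..., n (1-based time).
    lagged = np.column_stack([x[m - i:n - i] for i in range(1, m + 1)])
    target = x[m:]
    coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    residuals = target - lagged @ coef
    return coef, residuals

def hill_estimator(values, k):
    """Hill estimator based on the k largest of the given positive values."""
    v = np.sort(np.asarray(values, dtype=float))
    return float(np.mean(np.log(v[-k:] / v[-k - 1])))

# Assumed example: AR(1) with symmetric Pareto innovations, gamma = 0.3.
rng = np.random.default_rng(1)
n, gamma, phi = 5000, 0.3, 0.5
u = 1.0 - rng.uniform(size=n)
z = rng.choice([-1.0, 1.0], size=n) * u ** (-gamma)
x = np.empty(n)
x[0] = z[0]                                   # no burn-in, for brevity
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]

coef, residuals = fit_ar_least_squares(x, m=1)
gamma_residuals = hill_estimator(np.abs(residuals), k=500)  # model-based
gamma_direct = hill_estimator(np.abs(x), k=500)             # direct
```

Both estimates target the same extreme value index; they differ in variance and bias, which is the subject of the comparison that follows.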

In contrast, the Hill estimator applied to the absolute observations $|X_t|$ directly is
asymptotically normal with asymptotic variance

$$\gamma^2\left(1+2\,\frac{\sum_{j=1}^{\infty}\sum_{i=0}^{\infty}|\psi_i|^{1/\gamma}\wedge|\psi_{i+j}|^{1/\gamma}}{\sum_{i=0}^{\infty}|\psi_i|^{1/\gamma}}\right) \tag{2.7}$$

with $\psi_i$ denoting the coefficients of the MA($\infty$)-representation of the time series, i.e.
$X_t=\sum_{i=0}^{\infty}\psi_i Z_{t-i}$, provided that the d.f. $F_{|X|}$ of the absolute observations and the sequence of
numbers of order statistics used for estimation satisfy the analogs to the conditions (2.4)
and (2.5). Note that the asymptotic variance (2.7) of the Hill estimator directly applied
to the absolute values of the observations is strictly larger than the asymptotic variance
$\gamma^2$ of the Hill estimator based on the absolute residuals. For example, if one considers an
AR(1) time series with coefficient $\varphi_1\in(-1,1)$, then $\psi_i=\varphi_1^i$, $i\in\mathbb{N}_0$, and the asymptotic
variance (2.7) equals $\gamma^2(1+|\varphi_1|^{1/\gamma})/(1-|\varphi_1|^{1/\gamma})$. Therefore, Resnick and Starica (1997)
claimed that "the procedure of applying the Hill estimator directly to an autoregressive
process is less efficient than the procedure of first estimating autoregressive coefficients
and then estimating $\alpha$" ($=1/\gamma$) "using estimated residuals". This conclusion, however,
is justified in general only if both Hill estimators use the same number of order statistics.
Since the optimal numbers of order statistics used by the Hill estimators are essentially
determined by the functions $A$ occurring in the second order condition (2.4) for $F_{|Z|}$ (in
the case of the residual-based estimator) and $\tilde A$ in the analogous condition for the absolute
time series (for the directly applied Hill estimator), it is a priori unclear which of the
estimators has the smaller variance when they both use an appropriate number of order
statistics. Indeed, if the second order parameter $\rho$ is smaller in absolute value than the
analogous parameter $\tilde\rho$ from the second order condition for $F_{|X|}$, then the best rate of
convergence that can be achieved by the residual-based Hill estimator is of lower order
than the optimal rate of the directly applied Hill estimator, i.e. the former estimator
has asymptotic efficiency 0 with respect to the latter estimator. Conversely, if $|\rho|>|\tilde\rho|$
then the directly applied Hill estimator is asymptotically inefficient w.r.t. the model-based
estimator, if both use the optimal number of order statistics.
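
To get a feeling for the size of the effect when both estimators use the same number k of order statistics, one can tabulate the factor by which the asymptotic variance of the direct Hill estimator exceeds $\gamma^2$ in the AR(1) case. The sketch below assumes that this factor has the form $(1+|\varphi_1|^{1/\gamma})/(1-|\varphi_1|^{1/\gamma})$; the function name is hypothetical. It says nothing about the comparison when each estimator uses its own optimal k, which is the point of the preceding paragraph.

```python
def ar1_variance_factor(phi, gamma):
    """Factor (1 + |phi|^(1/gamma)) / (1 - |phi|^(1/gamma)) for an AR(1) series."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1)")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    a = abs(phi) ** (1.0 / gamma)
    return (1.0 + a) / (1.0 - a)

# Variance inflation of the direct estimator relative to the residual-based one
# (same k for both), for gamma = 0.5 and several autoregressive coefficients.
for phi in (0.2, 0.5, 0.8, 0.95):
    print(f"phi = {phi:4.2f}: factor = {ar1_variance_factor(phi, gamma=0.5):.2f}")
```

The factor equals 1 for an i.i.d. series and grows without bound as the absolute coefficient approaches 1.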

For general linear time series, it is not known how the second order behavior of $F_{|X|}$ is
related to the second order behavior of $F_{|Z|}$. However, for first order moving averages the
relationship has been discussed by Geluk, Peng and de Vries (2000) and Geluk and Peng 
(2000), and the same technique can be used for finite order moving averages. For general 
linear dependence structures but a rather particular class of distributions of innovations, 
the relationship can be deduced from results by Barbe and McCormick (2004). More 
precisely, assume that 

$$\bar F_Z(x) = x^{-1/\gamma}\big(c+dx^{-1}+o(x^{-1})\big),\qquad F_Z(-x) = x^{-1/\gamma}\big(\tilde c+\tilde d x^{-1}+o(x^{-1})\big)$$

as $x\to\infty$. Then, by Section 2.1 of Barbe and McCormick (2004), the tail of the linear
time series (2.1) behaves asymptotically as

$$\bar F_X(x) = x^{-1/\gamma}\big(C_\psi+D_\psi x^{-1}+o(x^{-1})\big)$$

with

$$C_\psi = \sum_{j=-\infty}^{\infty}\Big(c\,|\psi_j|^{1/\gamma}1_{\{\psi_j>0\}}+\tilde c\,|\psi_j|^{1/\gamma}1_{\{\psi_j<0\}}\Big),$$

$$D_\psi = \sum_{j=-\infty}^{\infty}\Big(d\,|\psi_j|^{1/\gamma+1}1_{\{\psi_j>0\}}+\tilde d\,|\psi_j|^{1/\gamma+1}1_{\{\psi_j<0\}}\Big).$$

Hence, for this type of innovations, both functions $A(t)$ and $\tilde A(t)$ are multiples of $t^{-1}$,
but the constant factors differ from each other. Note that the above expansion of $\bar F_Z$ is
equivalent to $F_Z^{-1}(1-t) = c^{\gamma}t^{-\gamma}+\gamma d/c+o(1)$, i.e., up to terms of smaller order the tails
behave like those of a shifted Pareto distribution. This shows that indeed the result by
Barbe and McCormick describes the relationship between the second order behavior of
the tails of $F_Z$ and $F_X$ (or of $F_{|Z|}$ and $F_{|X|}$) only for a rather limited family of distributions
of innovations.
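
As a quick plausibility check of the link between such an expansion and a shifted Pareto tail, consider the right tail $\bar F_Z(x)=0.5\,(x+1)^{-1/\gamma}$ (the double-sided shifted Pareto distribution). The constants $c$ and $d$ below are worked out for this example only; a first-order expansion and the explicit quantile function give

```latex
\begin{align*}
\bar F_Z(x) &= 0.5\,x^{-1/\gamma}\Big(1+\frac{1}{x}\Big)^{-1/\gamma}
  = x^{-1/\gamma}\Big(\frac{1}{2}-\frac{1}{2\gamma}\,x^{-1}+o(x^{-1})\Big),
  \qquad\text{i.e. } c=\frac{1}{2},\quad d=-\frac{1}{2\gamma},\\
F_Z^{-1}(1-t) &= (2t)^{-\gamma}-1
  = c^{\gamma}t^{-\gamma}+\frac{\gamma d}{c}.
\end{align*}
```

Here the remainder term vanishes identically, and the constant $\gamma d/c=-1$ is exactly the shift.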

To sum up, in general, the result by Resnick and Starica (1997) does not allow one to compare
the asymptotic performance of the direct nonparametric approach and the model-based
estimator if both use an appropriate number of order statistics.

Having said that, it is nevertheless plausible to expect that the residual-based estimator
has a smaller variance if the extra factor by which $\gamma^2$ is multiplied in the variance formula
(2.7) is much larger than 1 (e.g. if the absolute coefficient of an AR(1) time series is close
to 1). However, even in that case, the model-based approach has serious drawbacks:

• As mentioned before, usually one is not mainly interested in the extreme value index
but e.g. in exceedance probabilities $\bar F_X(x)$ (or extreme quantiles). To estimate such
quantities, one uses the relationship (2.3), which may introduce a non-negligible
additional error term if for the given threshold $x$ the ratio $\bar F_X(x)/\bar F_Z(x)$ is poorly
approximated by the right hand side of (2.3).

• Of course, the model-based approach makes sense only if the model assumptions
are (approximately) fulfilled. Although this remark is almost trivial, it is nevertheless
crucial to be aware of the fact that even moderate deviations from the linear
relationship between $X_t$ and $X_{t-1},\ldots,X_{t-m}$, which can hardly be detected by statistical
tests, may completely wreck the residual-based estimator, as we will see
in the next subsection.



2.3 Comparison of model-based and direct extreme quantile 
estimators: a simulation study 

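Here we assume that $X_t$, $t\in\mathbb{Z}$, is a stationary AR(1) time series, i.e. $X_t=\varphi_1X_{t-1}+Z_t$
with i.i.d. innovations $Z_t$ satisfying (2.2), and that an extreme quantile $F_X^{-1}(1-t)$ ($t>0$
small) is to be estimated. In this case relation (2.3) reads as

$$\lim_{x\to\infty}\frac{\bar F_X(x)}{\bar F_Z(x)} = \begin{cases}1/\big(1-\varphi_1^{1/\gamma}\big), & \varphi_1\in[0,1),\\[2pt] \big(1+|\varphi_1|^{1/\gamma}(1-p)/p\big)/\big(1-|\varphi_1|^{2/\gamma}\big), & \varphi_1\in(-1,0).\end{cases} \tag{2.8}$$

For simplicity (and in favor of the model-based approach) we assume that $p$ is known to
be equal to 1/2, so that (2.8) simplifies to

$$\lim_{x\to\infty}\frac{\bar F_X(x)}{\bar F_Z(x)} = 1/\big(1-|\varphi_1|^{1/\gamma}\big),$$

which, by the regular variation of $\bar F_Z$, is equivalent to the following relationship between
the corresponding quantile functions:

$$\lim_{t\downarrow 0}\frac{F_X^{-1}(1-t)}{F_Z^{-1}\big(1-(1-|\varphi_1|^{1/\gamma})\,t\big)} = 1.$$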

Hence, in the model-based approach one may estimate $F_X^{-1}(1-t)$ as follows:

• Estimate $\varphi_1$, e.g., by the sample autocorrelation $\hat\varphi_1$ at lag 1.

• Approximate $F_X^{-1}(1-t)$ by $F_Z^{-1}\big(1-(1-|\varphi_1|^{1/\gamma})\,t\big)$ and estimate the latter by the
Weissman estimator

$$\hat Z_{n-k:n-1}\left(\frac{n\,\big(1-|\hat\varphi_1|^{1/\hat\gamma_{n,\hat Z}}\big)\,t}{k}\right)^{-\hat\gamma_{n,\hat Z}},$$

where $\hat Z_{j:n-1}$ is the jth smallest order statistic of the residuals $\hat Z_t = X_t-\hat\varphi_1X_{t-1}$,
$2\le t\le n$, and

$$\hat\gamma_{n,\hat Z} := \frac{1}{k}\sum_{i=1}^{k}\log\frac{\hat Z_{n-i:n-1}}{\hat Z_{n-k-1:n-1}}$$

is the corresponding Hill estimator. (The Weissman estimator can be motivated
either by a generalized Pareto approximation of $F_Z$ or by the regular variation of $\bar F_Z$,
which implies $F_Z^{-1}(1-u)\approx F_Z^{-1}(1-k/n)\,(nu/k)^{-\gamma}$ for sufficiently small $u$ and $k/n$.)
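
The two steps above, together with the direct alternative, can be sketched in Python as follows. This is an illustration under simplifying assumptions rather than the code behind the simulation study: the function names are hypothetical, the (k+1)-th largest value serves as threshold for both the Hill and the Weissman estimator, p = 1/2 is taken as known, and the test series uses Student t innovations with 3 degrees of freedom merely as a convenient symmetric heavy-tailed example.

```python
import numpy as np

def hill_and_threshold(values, k):
    """Hill estimate from the k largest values and the threshold used for it."""
    v = np.sort(np.asarray(values, dtype=float))
    threshold = v[-k - 1]                      # (k+1)-th largest value
    gamma_hat = float(np.mean(np.log(v[-k:] / threshold)))
    return gamma_hat, float(threshold)

def weissman_quantile(values, k, n, u):
    """Weissman-type estimate of the (1 - u)-quantile from the k largest values."""
    gamma_hat, threshold = hill_and_threshold(values, k)
    return threshold * (n * u / k) ** (-gamma_hat)

def direct_quantile(x, k, t):
    """Direct approach: Weissman estimator applied to the observations."""
    x = np.asarray(x, dtype=float)
    return weissman_quantile(x, k, x.size, t)

def model_based_quantile(x, k, t):
    """Model-based approach for an AR(1) series with symmetric innovations."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    phi_hat = float(np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2))  # lag-1 autocorrelation
    residuals = x[1:] - phi_hat * x[:-1]
    gamma_hat, _ = hill_and_threshold(residuals, k)
    u = (1.0 - abs(phi_hat) ** (1.0 / gamma_hat)) * t            # adjusted tail probability
    return weissman_quantile(residuals, k, x.size, u)

# Assumed test series: linear AR(1) with heavy-tailed symmetric innovations.
rng = np.random.default_rng(2)
n, burn_in, phi = 2000, 500, 0.8
z = rng.standard_t(df=3, size=n + burn_in)
x = np.empty(n + burn_in)
x[0] = z[0]
for s in range(1, n + burn_in):
    x[s] = phi * x[s - 1] + z[s]
x = x[burn_in:]

q_direct = direct_quantile(x, k=100, t=0.001)
q_model = model_based_quantile(x, k=100, t=0.001)
```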

In a small simulation study we consider time series of length $n=2000$ with $\varphi_1=0.8$ and
two different symmetric distributions of the innovations $Z_t$:

a) a double-sided (unshifted) Pareto distribution, i.e. $\bar F_Z(x)=F_Z(-x)=0.5\,x^{-1/\gamma}$ for
$x\ge 1$ with $\gamma=1/2$;

b) a double-sided shifted Pareto distribution, i.e. $\bar F_Z(x)=F_Z(-x)=0.5\,(x+1)^{-1/\gamma}$
for $x\ge 0$ with $\gamma=0.3$.
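
A minimal sketch of how such series could be generated is given below. The helper names, the inverse-transform sampling, the burn-in length of 1000 and the random seed are assumptions made for illustration; the original simulation code is not available here.

```python
import numpy as np

def double_sided_pareto(rng, size, gamma, shift=0.0):
    """Symmetric innovations with P(Z > x) = P(Z < -x) = 0.5 * (x + shift)^(-1/gamma)."""
    u = 1.0 - rng.uniform(size=size)           # uniform on (0, 1]
    magnitude = u ** (-gamma) - shift          # shift=0: Model a), shift=1: Model b)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def simulate_ar1(z, phi, burn_in=0):
    """Run X_t = phi * X_{t-1} + Z_t over z and drop the first burn_in values."""
    z = np.asarray(z, dtype=float)
    x = np.empty_like(z)
    x[0] = z[0]
    for t in range(1, z.size):
        x[t] = phi * x[t - 1] + z[t]
    return x[burn_in:]

rng = np.random.default_rng(3)
n, burn_in, phi = 2000, 1000, 0.8
x_a = simulate_ar1(double_sided_pareto(rng, n + burn_in, gamma=0.5), phi, burn_in)
x_b = simulate_ar1(double_sided_pareto(rng, n + burn_in, gamma=0.3, shift=1.0), phi, burn_in)
```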

Clearly, Model a) is particularly favorable for the model-based approach, since the Hill 
estimator based on the innovations and (according to aforementioned results) also the Hill 
estimator based on the residuals is asymptotically unbiased for all intermediate sequences
$k=k_n$. In contrast, one might expect a significant bias of the model-based quantile
estimator in Model b) if one uses too large a number of order statistics, as the Hill
estimator is sensitive to a shift of the data. In the first model, it is a priori not clear
how large the bias of the direct nonparametric quantile estimator will be, since the second
order behavior of $F_X$ is not known. For Model b), however, the aforementioned result by
Barbe and McCormick (2004) is applicable. A careful inspection of the proofs given by 
Resnick and Starica (1997) and Drees (2000) and lengthy, but straightforward calculations 
show that the ratio of minimal asymptotic root mean squared errors of the model-based 
Hill estimator and the directly applied Hill estimator equals 



which is approximately equal to 1.03 for $\varphi_1=0.8$ and $\gamma=0.3$. Hence, one may expect that
the directly applied Hill estimator performs slightly better than the model-based estimator 
and this superiority may also carry over to the corresponding quantile estimators. 

We assume that the quantile $F_X^{-1}(0.999)$ is to be estimated. In both models, this quantile
is approximated by the average of the corresponding empirical quantiles of 200 simulated
time series of length 9 • 10^ (such that $F_X^{-1}(0.999)$ lies well within the simulated data sets),
which yields $F_X^{-1}(0.999)\approx 37.94$ in Model a) and $F_X^{-1}(0.999)\approx 7.312$ in Model b) (with
a relative approximation error of less than 0.002 with probability of at least 0.99).

Figure 1 displays the simulated root mean squared error (RMSE) and the $L_1$-error of
the direct quantile estimator (solid, resp. dotted line) and of the model-based estimator 
(dashed, resp. dash-dotted line) versus the number k of order statistics used for estimation. 
Obviously, the model-based approach outperforms the direct estimator in Model a), in 
that it has a much smaller RMSE and $L_1$-error if k is chosen optimally. Moreover, its
performance is less sensitive to an inappropriate choice of k: it performs reasonably well for 
all values of k between 150 and 750 (i.e., as expected one might use a very large proportion 
of all positive residuals), while the performance of the direct quantile estimator quickly 
deteriorates when k is smaller than 200 or larger than 300. Conversely, as expected, in 
Model b) the direct estimator performs somewhat better than the model-based estimator, 
i.e., its minimal RMSE and $L_1$-error are smaller than the corresponding errors of the model-based
estimator, and the performance of the direct estimator is less sensitive to the choice
of k. However, both effects are much less pronounced than in Model a). This can also be 
seen from Table 1 which summarizes the minimal errors with the corresponding optimal 
values of k and also the simulated bias and the standard errors (i.e. simulated standard 
deviations) for the choice of k which leads to the minimal RMSE: while in Model a) the 
RMSE of the direct estimator is about 2.5 times larger than the RMSE of the model-based 
estimator, the ratio between the minimal RMSEs in Model b) is just about 1.2.
Figure 1: Simulated RMSE and $L_1$-error of the quantile estimator directly applied to
the data (solid resp. dotted line) and of the quantile estimator based on the analysis
of residuals (dashed resp. dash-dotted line) vs. number k of order statistics for a linear
AR(1) time series; left plot: innovations according to Model a), right plot: Model b)





                         Model a)                       Model b)
                         RMSE          L1-error         RMSE         L1-error
direct estimation        15.4 (k=249)  9.7 (k=249)      2.7 (k=73)   1.8 (k=70)
  bias / s.e.            3.3 / 15.0                     0.7 / 2.6
model-based estimation   6.1 (k=662)   4.5 (k=765)      3.2 (k=25)   2.2 (k=23)
  bias / s.e.            -0.6 / 6.0                     1.1 / 3.0

Table 1: Minimal errors, bias and standard errors of the quantile estimators in the
(unperturbed linear) AR(1) time series models



From these results, one gets the impression that, although the model-based approach
does not yield more accurate estimators for all distributions of innovations, its overall
performance is at least as good as the performance of the direct nonparametric approach.
However, as we will see next, this interpretation is premature (and indeed quite dangerous)
as the model-based estimator can be very sensitive to relatively small deviations from the
model.

As an example, we consider a nonlinear AR(1) time series, namely a stationary solution
to the equation

$$X_t = \varphi_1X_{t-1}+\delta\,\mathrm{sgn}(X_{t-1})\log\big(\max(|X_{t-1}|,1)\big)+Z_t\quad\text{with }\varphi_1=0.8,\ \delta=0.6. \tag{2.9}$$

Here the linear relationship between $X_t$ and $X_{t-1}$ is perturbed by an extra logarithmic
term. Of course, one cannot expect that the relationship (2.8) holds in this nonlinear
model. Hence, most likely, the model-based estimator will show an increased bias.
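
The perturbed recursion is easy to simulate. In the sketch below, the perturbation coefficient is called `delta`, the innovations follow a double-sided shifted Pareto distribution with extreme value index 0.3, and the burn-in length and seed are arbitrary; all of these names and settings are assumptions for illustration. Fitting a linear AR(1) model by the lag-1 sample autocorrelation then shows how the logarithmic term is absorbed into an inflated coefficient estimate.

```python
import numpy as np

def simulate_nonlinear_ar1(z, phi=0.8, delta=0.6):
    """X_t = phi*X_{t-1} + delta*sgn(X_{t-1})*log(max(|X_{t-1}|, 1)) + Z_t."""
    z = np.asarray(z, dtype=float)
    x = np.empty_like(z)
    x[0] = z[0]
    for t in range(1, z.size):
        prev = x[t - 1]
        x[t] = phi * prev + delta * np.sign(prev) * np.log(max(abs(prev), 1.0)) + z[t]
    return x

rng = np.random.default_rng(4)
n, burn_in, gamma = 2000, 1000, 0.3
u = 1.0 - rng.uniform(size=n + burn_in)                  # uniform on (0, 1]
z = rng.choice([-1.0, 1.0], size=n + burn_in) * (u ** (-gamma) - 1.0)
x = simulate_nonlinear_ar1(z)[burn_in:]

# Coefficient of the (misspecified) linear AR(1) fit.
xc = x - x.mean()
phi_hat = float(np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2))
```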

From Figure 2, which shows a typical scatterplot of $(X_{t-1},X_t)$ for a time series according
to model (2.9) with shifted double-sided Pareto innovations from Model b), it is apparent
that, with the naked eye, such a time series can hardly be distinguished from a classical
linear AR(1) time series (with an increased autoregressive coefficient). Moreover, if one
fits a linear AR(1) model to such a time series of length n = 2000, then the turning point
test and the difference-sign test with nominal size 0.05 (see Brockwell and Davis (2002),
p. 36 f.) detect dependence in the residuals with probability less than 0.06, i.e. these
tests are not capable of detecting the model deviation. Also the Portmanteau test with
nominal size 0.05 applied to the residuals has a maximal power of about 0.13, i.e. the
rejection rate is less than 0.1 higher than the false alarm rate if the data comes from the
corresponding AR(1) model. (Note that the Portmanteau test should not be applied if
the variance of the innovations is infinite; hence, strictly speaking, it is not suitable for
Model a).) To sum up, it is almost impossible to distinguish a time series from model
(2.9) from a linear AR(1) time series.

Figure 2: Scatterplot of $(X_{t-1},X_t)$ for a simulated nonlinear AR(1) time series (2.9) of length
n = 2000 with innovations according to Model b); the dashed line indicates the fitted
linear relationship with estimated autoregressive coefficient 0.982
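
For reference, the turning point test and the difference-sign test can be implemented in a few lines. The sketch below uses the standard normal approximations for the number of turning points (mean 2(n-2)/3, variance (16n-29)/90) and for the number of positive differences (mean (n-1)/2, variance (n+1)/12); the function names and return format are illustrative choices, and the functions would be applied to the residuals of the fitted linear model.

```python
import numpy as np
from math import erfc, sqrt

def _two_sided_p_value(z):
    """Two-sided p-value under a standard normal approximation."""
    return erfc(abs(z) / sqrt(2.0))

def turning_point_test(y):
    """Count turning points of y and compare with the i.i.d. benchmark."""
    y = np.asarray(y, dtype=float)
    n = y.size
    left, mid, right = y[:-2], y[1:-1], y[2:]
    is_peak = (mid > left) & (mid > right)
    is_trough = (mid < left) & (mid < right)
    count = int(np.sum(is_peak | is_trough))
    z = (count - 2.0 * (n - 2) / 3.0) / sqrt((16.0 * n - 29.0) / 90.0)
    return count, z, _two_sided_p_value(z)

def difference_sign_test(y):
    """Count positive first differences of y and compare with the i.i.d. benchmark."""
    y = np.asarray(y, dtype=float)
    n = y.size
    count = int(np.sum(np.diff(y) > 0))
    z = (count - (n - 1) / 2.0) / sqrt((n + 1) / 12.0)
    return count, z, _two_sided_p_value(z)

# Example on i.i.d. noise: reject independence at size 0.05 if the p-value is below 0.05.
rng = np.random.default_rng(5)
noise = rng.standard_normal(2000)
tp_count, tp_z, tp_p = turning_point_test(noise)
ds_count, ds_z, ds_p = difference_sign_test(noise)
```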

Figure 3 shows the RMSE and $L_1$-errors of the quantile estimator obtained from the direct
approach and the model-based approach when the time series are simulated according to
(2.9), but erroneously a linear AR(1) model is assumed. Table 2 gives the minimal
errors in this case (analogously to Table 1). For both d.f.s of the innovations, the errors
of the model-based quantile estimators are much larger in the nonlinear AR(1) model
than for the classical linear AR(1) time series. To a large extent, the deterioration of
the performance is caused by the large bias, but also the variance is much larger now
even if the decrease of the optimal number of order statistics is taken into account. In
sharp contrast, the direct quantile estimator, which does not rely on a specific time series
model, is more precise for these nonlinear AR(1) time series than for the linear ones.
Consequently, if the innovations are drawn from Model a), then the minimal RMSE of
the model-based quantile estimator is about 25% larger than the minimal RMSE of the
direct quantile estimator, while for innovations according to Model b) the RMSE of the
model-based estimator is more than 7 times larger than the RMSE of the nonparametric
estimator!

Figure 4 demonstrates that the very poor performance of the quantile estimator which 
is based on the residual analysis is not due a few particularly wrong estimates but that 
indeed the estimator yields rather poor results with a high probability. In this plot, for 
innovations according to Model b), kernel estimates of the density of the direct (solid line) 
and the model-based quantile estimators (dashed line) are displayed for optimal values of 
k (i.e., k = 99 and k = 22, respectively). While the mode of both densities is close to the 
true value (indicated by the vertical dotted line), the density of the model-based estimator 
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Figure 3: Simulated RMSE and Li-error of the quantile estimator directly applied to 
the data (solid resp. dotted line) and of the quantile estimator based on the analysis of 
residuals (dashed resp. dash-dotted line) vs. number k of order statistics for the nonlinear 
AR{1) time series (12.91) : left plot: innovations according to Model a), right plot: Model 
b) 





Model a) 


Model b) 




RMSE Li-error 


RMSE Li-error 




bias / s.e. 


bias / s.e. 


direct estimation 


11.9 (k=247) 8.4 (k=260) 


2.1 (k=99) 1.5 (k=122) 




-0.0 / 11.9 


0.0 / 2.1 


model based estimation 


14.9 (k=462) 11.4 (k=498) 


15.4 (k=22) 9.7 (k=16) 




9.2 / 11.7 


9.5 / 12.1 



Table 2: Minimal errors, bias and standard errors of the quantile estimators in the non- 
linear AR[1) time series (12. 9p 
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Figure 4: Estimated density of the direct quantile estimator (solid line) and the model- 
based estimator (dashed line) for time series according to (12. 9p with innovations according 
to Model b); the true quantile is indicated by the vertical dotted line 

is strongly skewed to the right and has a large spread. In contrast, the distribution of 
the nonparametric estimator is more symmetric and much more concentrated around the 
true value. 
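
As a quick arithmetic check of Table 2, the reported minimal RMSE can be compared with the reported bias and standard error through the usual decomposition RMSE^2 = bias^2 + s.e.^2. The following minimal Python sketch uses only the numbers printed in the table. The dictionary layout and the function names are illustrative choices, and the check assumes that bias and standard error refer to the RMSE-optimal value of k.

```python
import math

# (bias, standard error, reported minimal RMSE), copied from Table 2.
# Assumption: bias and s.e. refer to the RMSE-optimal k.
TABLE2 = {
    ("direct", "a"): (-0.0, 11.9, 11.9),
    ("direct", "b"): (0.0, 2.1, 2.1),
    ("model-based", "a"): (9.2, 11.7, 14.9),
    ("model-based", "b"): (9.5, 12.1, 15.4),
}


def rmse_from_bias_se(bias: float, se: float) -> float:
    """RMSE implied by the decomposition RMSE^2 = bias^2 + s.e.^2."""
    return math.hypot(bias, se)


def rmse_ratio(model: str) -> float:
    """Ratio of the reported minimal RMSEs: model-based over direct."""
    return TABLE2[("model-based", model)][2] / TABLE2[("direct", model)][2]


if __name__ == "__main__":
    for (method, model), (bias, se, rmse) in TABLE2.items():
        implied = rmse_from_bias_se(bias, se)
        print(f"{method:12s} Model {model}): reported {rmse:5.1f}, implied {implied:5.2f}")
    print(f"RMSE ratio, Model a): {rmse_ratio('a'):.2f}")
    print(f"RMSE ratio, Model b): {rmse_ratio('b'):.2f}")
```

Under this reading, the error of the model-based estimator stems from a substantial bias together with a large standard error in both models, whereas the direct estimator is essentially unbiased.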

To sum up, the example shows that the model-based approach to the estimation of extreme 
quantiles can give completely misleading estimates even if the deviation from the assumed 
linear time series model is moderate in the sense that it is very difficult to detect by means 
of statistical tests. Therefore, estimators that are based on an extreme value analysis of
residuals obtained under parametric model assumptions about the dependence structure
should be used only with utmost care. In particular, it is not justified to
consider them generally superior to the directly applied extreme value estimators.




3 Analysis of the extremal dependence structure: Is 
there a world behind the extremal index? 

So far we have only considered estimators of the marginal tail behavior. For many
applications, the dependence between consecutive extreme values of the time series is also
of interest. For example, if $X_t$ denotes the negative return (loss) of some financial
investment, it is important to assess the risk that all (or some of) the losses
$X_t, X_{t+1}, \ldots, X_{t+m-1}$ in $m$ consecutive periods (or perhaps the total loss
$\sum_{i=0}^{m-1} X_{t+i}$) are large.

In the analysis of the extreme value behavior of maxima $M_n := \max_{1\le i\le n} X_i$ of
$n$ consecutive observations, the so-called extremal index plays a crucial role. Let
$\tilde X_t$, $1 \le t \le n$, denote an associated sequence of i.i.d. random variables with
d.f. $F_X$. Assume that, for some normalizing constants $a_n > 0$ and $b_n \in \mathbb{R}$,

$$\frac{\max_{1\le t\le n} \tilde X_t - b_n}{a_n} \longrightarrow G$$

weakly for some nondegenerate d.f. $G$. Leadbetter (1983) proved that then

$$P\Big\{ \frac{M_n - b_n}{a_n} \le x \Big\} \longrightarrow G^\theta(x)$$

for some $\theta \in [0, 1]$ provided Leadbetter's condition $D(u_n)$

$$\forall\, m, n > 0 \;\; \exists\, \alpha_{n,m} \;\; \forall\, k, l > 0, \; 1 \le i_1 < i_2 < \ldots < i_k < i_k + m \le j_1 < j_2 < \ldots < j_l \le n:$$

$$\Big| P\Big\{ \max_{1\le r\le k} X_{i_r} \le u_n, \max_{1\le s\le l} X_{j_s} \le u_n \Big\} - P\Big\{ \max_{1\le r\le k} X_{i_r} \le u_n \Big\} \cdot P\Big\{ \max_{1\le s\le l} X_{j_s} \le u_n \Big\} \Big| \le \alpha_{n,m}$$

$$\text{and} \quad \exists\, m_n = o(n): \; \lim_{n\to\infty} \alpha_{n,m_n} = 0$$

holds for all $u_n = a_n x + b_n$ with $x > G^{-1}(0)$, and the d.f. $P\{(M_n - b_n)/a_n \le x\}$ of the
standardized maximum converges for some $x > G^{-1}(0)$. Hence, if the extremal index $\theta$ is
strictly positive, the maximum converges to some nondegenerate limit distribution that
is of the same type as the limit distribution in the case of independence.

Moreover, Hsing, Hüsler and Leadbetter (1988) proved that under weak additional
assumptions (including the slightly stronger mixing condition $\Delta(u_n)$) the point process
$\sum_{t=1}^n \delta_{t/n} 1_{(a_n x + b_n, \infty)}(X_t)$ of standardized time points at which
exceedances occur converges to a compound Poisson process. Then, typically, the extremal
index $\theta$ equals the reciprocal value of its mean cluster size (although in general one
only knows that $\theta$ is a lower bound for this value).

Since the asymptotic behavior of maxima of consecutive observations is completely deter- 
mined by the extremal index and the tail behavior of Fx, the literature on the statistical 
analysis of the extremal dependence structure focusses on the estimation of $\theta$ and, to a
lesser extent, the estimation of the cluster size distribution; see Hsing (1991, 1993), Smith 
and Weissman (1994), Weissman and Novak (1998), Ancona-Navarrete and Tawn (2000) 
and Ferro and Segers (2003), among others. 
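
To make the estimation of the extremal index concrete, the following minimal Python sketch implements a simple runs-type estimator: the number of clusters of exceedances divided by the number of exceedances, where two exceedances belong to different clusters if they are more than `run_length` time points apart. It is written in the spirit of the runs estimators studied in the literature cited above and is not taken from any of these papers. The function names, the threshold choice and the moving-maximum test series (for which the extremal index is known to equal 1/2) are illustrative assumptions.

```python
import math
import random


def runs_estimator(x, threshold, run_length):
    """Number of clusters of exceedances divided by the number of exceedances.

    Two successive exceedances are assigned to different clusters if they are
    more than run_length time points apart.
    """
    exceed_times = [t for t, value in enumerate(x) if value > threshold]
    if not exceed_times:
        raise ValueError("no observation exceeds the threshold")
    clusters = 1
    for previous, current in zip(exceed_times, exceed_times[1:]):
        if current - previous > run_length:
            clusters += 1
    return clusters / len(exceed_times)


def unit_frechet(rng):
    """Unit Frechet variate by inversion of P{Z <= z} = exp(-1/z)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -1.0 / math.log(u)


def empirical_quantile(x, p):
    """Simple empirical p-quantile, used as a high threshold."""
    return sorted(x)[int(p * (len(x) - 1))]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    z = [unit_frechet(rng) for _ in range(n + 1)]
    iid = z[:n]                                           # extremal index 1
    moving_max = [max(z[t], z[t + 1]) for t in range(n)]  # extremal index 1/2
    for name, series in (("i.i.d.", iid), ("moving maximum", moving_max)):
        u = empirical_quantile(series, 0.98)
        print(name, round(runs_estimator(series, u, run_length=1), 3))
```

Both the threshold and the run length are tuning parameters, and the estimate can be sensitive to their choice.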

However, as the aforementioned example shows, in financial applications often other statis- 
tics of extreme values (than maxima) are of main interest, and the same holds true in 
other fields of applications where exceedances over high thresholds rather than block 
maxima are considered. We will demonstrate by a particular time series model that the 
extremal index and the cluster size distributions are often not sufficient to determine the 
distribution of statistics which arise in a natural way. 

In the remainder of this section, we consider stationary solutions of stochastic recurrence 
equations of the type 

$$X_t = A_t X_{t-1} + B_t, \qquad t \in \mathbb{Z}, \qquad (3.10)$$

with $(A_t, B_t)$ denoting i.i.d. random vectors with values in $(0, \infty)^2$. For instance, a squared
ARCH(1) time series satisfies this relationship; further applications of this model were
described by Vervaat (1979). Kesten (1973) proved that such a stationary solution exists if
$A_1$ does not have a lattice distribution, the distribution of $B_1/(1 - A_1)$ is not degenerate,
and if there exists $\kappa > 0$ such that $E A_1^\kappa = 1$, $E(A_1^\kappa \max(\log A_1, 0)) < \infty$ and
$E B_1^\kappa \in (0, \infty)$. Moreover, then $1 - F_X(x) \sim c x^{-\kappa}$ for some $c > 0$ so that the
standardized maxima of an accompanying i.i.d. sequence converge to a Fréchet distribution
with extreme value index $\gamma = 1/\kappa$.
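
The tail index in Kesten's result is defined only implicitly through the moment condition $E A_1^\kappa = 1$. As an illustration, consider a squared ARCH(1) series of the form $X_t = (\beta + \lambda X_{t-1}) Z_t^2$ with i.i.d. standard normal $Z_t$, so that $A_t = \lambda Z_t^2$ and $B_t = \beta Z_t^2$. The Gaussian innovations and the parameter names $\lambda$, $\beta$ are assumptions made for this sketch only. Since $E|Z|^{2\kappa} = 2^\kappa \Gamma(\kappa + 1/2)/\sqrt{\pi}$, the root can be found by bisection on the logarithm of the moment, which is convex in $\kappa$, vanishes at 0 and has negative slope there.

```python
import math

EULER_GAMMA = 0.5772156649015329


def log_moment(kappa: float, lam: float) -> float:
    """log E[(lam * Z**2)**kappa] for a standard normal Z.

    Uses E|Z|^(2*kappa) = 2**kappa * Gamma(kappa + 1/2) / sqrt(pi).
    """
    return (kappa * math.log(2.0 * lam) + math.lgamma(kappa + 0.5)
            - 0.5 * math.log(math.pi))


def solve_kappa(lam: float, tol: float = 1e-12) -> float:
    """Positive root kappa of E[(lam * Z**2)**kappa] = 1, found by bisection."""
    # E log(lam * Z^2) = log(lam) - EULER_GAMMA - log(2) must be negative.
    if lam <= 0.0 or math.log(lam) - EULER_GAMMA - math.log(2.0) >= 0.0:
        raise ValueError("need 0 < lam < 2 * exp(EULER_GAMMA), about 3.56")
    lo = 1.0
    while log_moment(lo, lam) >= 0.0:  # shrink until strictly below the root
        lo /= 2.0
    hi = 2.0 * lo
    while log_moment(hi, lam) <= 0.0:  # grow until strictly above the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if log_moment(mid, lam) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for lam in (0.5, 0.9, 1.0):
        kappa = solve_kappa(lam)
        print(f"lambda = {lam}: kappa = {kappa:.4f}, gamma = {1.0 / kappa:.4f}")
```

For $\lambda = 1$ the root is $\kappa = 1$, because $E Z^2 = 1$; this gives a simple check of the solver. Note that the root depends on the whole distribution of $A_1$, not only on its tail.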

De Haan et al. (1989) calculated the extremal index and the cluster size distribution
of such a time series. Let $W_j := \prod_{i=1}^j A_i^\kappa$ (with the convention $W_0 = 1$). Note that
$\log W_j$, $j \ge 1$, is a random walk with negative drift. Hence the sequence $W_j$, $j \ge 1$, tends
to 0 and it is almost surely bounded. Let $U_k$ denote the $k$th largest value of this sequence.
Then the extremal index $\theta$ and the probability $\pi_k$ that a cluster of the limiting compound
Poisson point process has size $k$ are given by

$$\theta = \int_0^1 P\Big\{ U_1 = \max_{j\ge 1} W_j \le t \Big\}\, dt = 1 - E \min\{U_1, 1\},$$

$$\pi_k = \frac{\theta_k - \theta_{k+1}}{\theta} \quad \text{with} \qquad (3.11)$$

$$\theta_k = \int_0^1 P\Big\{ \sum_{j=1}^\infty 1_{(t,\infty)}(W_j) = k-1 \Big\}\, dt = E\big( \min\{U_{k-1}, 1\} - \min\{U_k, 1\} \big).$$

Hence the extremal index is determined by the distribution of the maximum of the
geometric random walk $W_j$, $j \ge 1$, and the probability that a cluster of exceedances has size
$k$ is determined by the distributions of the $k$ largest "order statistics" of this sequence.
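
The representation $\theta = 1 - E\min\{U_1, 1\}$, with $U_1$ the maximum of the geometric random walk $W_j = \prod_{i=1}^j A_i^\kappa$, lends itself to simulation. The following minimal Python sketch estimates $\theta$ by Monte Carlo for $A_t = \lambda Z_t^2$ with standard normal $Z_t$ (a squared ARCH(1) series with Gaussian innovations, which is an assumption of this sketch). The function name, the number of paths and the truncation of the random walk after `n_steps` steps are illustrative choices; the truncation is harmless only because the walk drifts to 0 quickly.

```python
import math
import random


def extremal_index_mc(lam, kappa, n_paths=20000, n_steps=50, seed=1):
    """Monte Carlo estimate of theta = 1 - E min(U_1, 1).

    U_1 is the maximum of W_j = prod_{i<=j} (lam * Z_i**2)**kappa, j >= 1,
    approximated here by the maximum over the first n_steps values.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        log_w = 0.0
        log_max = -math.inf
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            # Guard against log(0); z == 0 has negligible probability.
            log_w += kappa * math.log(lam * max(z * z, 1e-300))
            log_max = max(log_max, log_w)
        # min(U_1, 1) = exp(min(log U_1, 0)), computed without overflow.
        total += math.exp(min(log_max, 0.0))
    return 1.0 - total / n_paths


if __name__ == "__main__":
    # lam = 1 gives kappa = 1 exactly, since E Z^2 = 1.
    print(f"estimated extremal index: {extremal_index_mc(1.0, 1.0):.3f}")
```

The cluster size probabilities in (3.11) could be estimated in the same way by tracking the largest values of each simulated path instead of only the maximum.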

More recently, Gomes et al. (2004) (see also Gomes et al. (2006)) analyzed the joint
asymptotics of $k$ consecutive observations of the stationary solution of the recurrence
equation (3.10). More precisely, they proved that there exists a sequence $a_n > 0$ such
that

$$\lim_{n\to\infty} n P\{X_j > a_n x_j \text{ for all } 1 \le j \le k\} = E \min_{0\le j\le k-1} \big( x_{j+1}^{-\kappa} W_j \big),$$

$$\lim_{n\to\infty} n P\{X_j > a_n x_j \text{ for some } 1 \le j \le k\} = E \max_{0\le j\le k-1} \big( x_{j+1}^{-\kappa} W_j \big),$$

for all $x_j > 0$. Obviously, the limits on the right hand sides cannot be expressed in terms
of the extremal index $\theta$ and the cluster size distribution $\pi_k$, $k \ge 1$, for all $x_j > 0$. However,
this will not even be possible in the special case that all $x_j$ are equal to some $x$, say, so
that

$$\lim_{n\to\infty} n P\{X_j > a_n x \text{ for all } 1 \le j \le k\} = x^{-\kappa} E \min_{0\le j\le k-1} W_j = x^{-\kappa} \int_0^1 P\Big\{ \min_{0\le j\le k-1} W_j > t \Big\}\, dt,$$

$$\lim_{n\to\infty} n P\{X_j > a_n x \text{ for some } 1 \le j \le k\} = x^{-\kappa} E \max_{0\le j\le k-1} W_j = x^{-\kappa} \int_0^\infty P\Big\{ \max_{0\le j\le k-1} W_j > t \Big\}\, dt,$$

because the right hand sides depend on the distribution of the minimum and the maximum
of a finite segment of the sequence $W_j$, $j \ge 1$, instead of the distribution of the order
statistics of the whole sequence.
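
As a small consistency check of the two limits, consider $k = 2$ with equal thresholds, reading $W_1 = A_1^\kappa$ and $W_0 = 1$ (the notation assumed throughout this sketch). Because $\min(1, w) + \max(1, w) = 1 + w$ and $E A_1^\kappa = 1$, the two limits are linked exactly as inclusion-exclusion requires, given that the case $k = 1$ yields $n P\{X_1 > a_n x\} \to x^{-\kappa}$:

```latex
\begin{align*}
E\min(1, W_1) + E\max(1, W_1) &= E(1 + W_1) = 1 + E A_1^{\kappa} = 2, \\
\lim_{n\to\infty} n P\{X_1 > a_n x \text{ or } X_2 > a_n x\}
  &= x^{-\kappa} E\max(1, W_1) \\
  &= 2 x^{-\kappa} - x^{-\kappa} E\min(1, W_1) \\
  &= \lim_{n\to\infty} \bigl( 2\, n P\{X_1 > a_n x\}
     - n P\{X_1 > a_n x,\, X_2 > a_n x\} \bigr).
\end{align*}
```

Already in this simplest case the limit depends on $E\min(1, W_1)$, a functional of a single step of the random walk rather than of its overall maximum.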

The asymptotic variance of the Hill estimator discussed in Section 2 is another example
of a parameter which arises naturally in statistical applications and cannot be determined
from the extremal index and the cluster size distribution. Drees (2000) showed that the
Hill estimator based on the $k$ largest observations of a stationary solution of the recurrence
equation (3.10) is asymptotically normal with variance

$$\gamma^2 \Big( 1 + 2 \sum_{j=1}^\infty \int_0^1 P\{W_j > t\}\, dt \Big)$$

provided that $k$ tends to $\infty$ not too fast. (An analogous result holds true for the maximum
likelihood estimator of the extreme value index, which is asymptotically normal with a
similar variance with factor $(1 + \gamma)^2$ instead of $\gamma^2$.) Here, the parameter is determined
by all marginal distributions of the geometric random walk $W_j$, $j \ge 1$.
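
Since $\int_0^1 P\{W_j > t\}\, dt = E\min\{W_j, 1\}$, a variance of the form $\gamma^2 (1 + 2 \sum_{j\ge 1} E\min\{W_j, 1\})$ can be approximated by simulation. The minimal Python sketch below does this for $A_t = \lambda Z_t^2$ with standard normal $Z_t$ and $W_j = \prod_{i=1}^j A_i^\kappa$. The Gaussian innovations, the function name and the truncation of the infinite sum after `n_steps` terms are assumptions made for illustration.

```python
import math
import random


def hill_variance_mc(lam, kappa, n_paths=20000, n_steps=50, seed=1):
    """Monte Carlo estimate of gamma^2 * (1 + 2 * sum_j E min(W_j, 1)).

    W_j = prod_{i<=j} (lam * Z_i**2)**kappa with standard normal Z_i;
    the infinite sum over j is truncated after n_steps terms.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        log_w = 0.0
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            # Guard against log(0); z == 0 has negligible probability.
            log_w += kappa * math.log(lam * max(z * z, 1e-300))
            total += math.exp(min(log_w, 0.0))  # min(W_j, 1)
    gamma = 1.0 / kappa
    return gamma ** 2 * (1.0 + 2.0 * total / n_paths)


if __name__ == "__main__":
    # lam = 1 gives kappa = 1, hence gamma = 1; for i.i.d. data the
    # asymptotic variance of the Hill estimator would be gamma^2 = 1.
    print(f"approximate asymptotic variance: {hill_variance_mc(1.0, 1.0):.2f}")
```

Every term of the sum is positive, so the serial dependence inflates the variance relative to the i.i.d. value $\gamma^2$, and the size of the inflation depends on all marginal distributions of the random walk rather than on its maximum alone.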

These examples demonstrate that in many applications the extremal index (and often 
also the cluster size distribution) does not give the information one is actually interested 
in. Thus there is clearly a need to put the statistical analysis of the extremal dependence 
on a broader basis such that also parameters can be treated which describe aspects of the 
dependence structure different from the size of clusters of exceedances. 

An interesting point of departure may be the concept of cluster functionals introduced by 
Yun (2000) and developed further by Segers (2003). Roughly speaking, these are function- 
als which depend only on all shortest vectors of observations containing exceedances over 
a given high threshold. An asymptotic theory on estimators of functionals of that type 
would be a significant step forward towards a general approach to analyze the extremal 
dependence structure of stationary time series. 

4 Conclusions 

In Section 2 we compared the direct approach to the marginal tail analysis, advocated by 
Rootzén, Leadbetter and de Haan (1990), among others, with a model-based approach
where the tail of the residuals is analyzed. It was pointed out that the perception that the 
latter approach is more efficient when the model assumptions are correct is not generally 
justified. Moreover, it was shown that the model-based estimators can be extremely sen- 
sitive to moderate deviations from the models, which are difficult to detect by statistical 
means. Hence, in most applications, the direct nonparametric analysis, that has proved 
powerful in the classical i.i.d. setting in several papers by Laurens de Haan and many 
others, seems also preferable for the tail analysis of serially dependent data. 

It is worth mentioning that usually the model-based approach is even more problematic if 
a nonlinear time series model is assumed. For example, as we have seen in Section 3, the 
marginal tail behavior of a stationary solution to the stochastic recurrence equation (3.10)
does not only depend on the tail of the "innovations" $A_t$ (and $B_t$), but on their whole
distribution, since the extreme value index $\gamma = 1/\kappa$ is determined by the relationship
$E A_1^\kappa = 1$. So, in a parametric submodel, it will not be sufficient to analyze the tail
behavior of suitably defined residuals, but one has to estimate this expectation, that 
depends on the center of the distribution of the innovations and, in addition, is sensitive 
to deviations in the tail. Hence, to obtain a reliable estimate, usually one has to combine 
some nonparametric estimate for the central region with an extreme value estimator for 
the tail of the distribution of the innovations, which makes the whole method cumbersome. 

While in the marginal tail analysis sometimes too restrictive model assumptions are used, 
the inference on the extremal dependence structure is often too focussed on the extremal 
index (which is then estimated in quite general time series models). The nonlinear time 
series (3.10) is a nice example in which parameters arise in a natural way which cannot be
expressed in terms of the extremal index or the cluster size distribution. This observation
calls for more general estimators of the extremal dependence structure. Unfortunately,
even from the most optimistic point of view, such a general theory has just started to
emerge and many challenging problems still wait for a solution.

Acknowledgement: The author is grateful to Jürg Hüsler and Peng Liang for pointing
out the reference Barbe and McCormick (2004).

References 

Ancona-Navarrete, M.A., and Tawn, J. A. (2000). A comparison of methods for estimating 
the extremal index. Extremes 3, 5-38. 

Barbe, Ph., and McCormick, W.P. (2004). Tail calculus with remainder, applications to tail
expansions for infinite order moving averages, randomly stopped sums, and related topics.
Extremes 7, 337-365.

Brockwell, P.J., and Davis, R.A. (2002). Introduction to Time Series and Forecasting 
(2nd ed). Springer. 

Datta, S., and McCormick, W.P. (1998). Inference for the tail parameters of a linear 
process with heavy tail innovations. Ann. Inst. Statist. Math. 50, 337-359. 

Davis, R.A., and Resnick, S.I. (1985). Limit theory for moving averages of random vari- 
ables with regularly varying tail probabilities. Ann. Probab. 13, 179-195. 

Dekkers, A.L.M., and de Haan, L. (1989). On the estimation of the extreme-value index 
and large quantile estimation. Ann. Statist. 17, 1795-1832. 

Drees, H. (2000). Weighted approximations of tail processes for β-mixing random
variables. Ann. Appl. Probab. 10, 1274-1301.

Drees, H. (2002). Tail empirical processes under mixing conditions. In: H.G. Dehling,
T. Mikosch and M. Sørensen (eds.). Empirical Process Techniques for Dependent Data,
325-342, Birkhäuser, Boston.

Drees, H. (2003). Extreme quantile estimation for dependent data with applications to 
finance. Bernoulli 9, 617-657. 

Ferro, C.A.T., and Segers, J. (2003). Inference for clusters of extreme values. J. Roy.
Statist. Soc. B 65, 545-556.

Geluk, J.L., Peng, L., and de Vries, C.G. (2000). Convolutions of heavy-tailed random 
variables and applications to portfolio diversification and MA(1) time series. Adv. Appl. 
Probab. 32, 1011-1026.

Geluk, J. L., and Peng, L. (2000). An adaptive optimal estimate of the tail index for 
MA(1) time series. Statist. Probab. Lett. 46, 217-227. 

Gomes, M.I., de Haan, L., and Pestana, D. (2004). Joint exceedances of the ARCH 
process. J. Appl. Probab. 41, 919-926. 

Gomes, M.I., de Haan, L., and Pestana, D. (2006). Correction to: Joint exceedances of 
the ARCH process. J. Appl. Probab. 43, 1206. 

de Haan, L. (1990). Fighting the arch-enemy with mathematics. Statist. Neerlandica 44, 
45-68. 






de Haan, L., Resnick, S.I., Rootzén, H., and de Vries, C. (1989). Extremal behaviour of
solutions to a stochastic difference equation with applications to ARCH-processes. Stoch.
Proc. Appl. 32, 213-224.

de Haan, L., and de Ronde, J. (1998). Sea and wind: multivariate extremes at work. 
Extremes 1, 7-45. 

de Haan, L., and Stadtmüller, U. (1996). Generalized regular variation of second order.
J. Aust. Math. Soc. A 61, 381-395. 

Hsing, T. (1991). On tail estimation using dependent data. Ann. Statist. 19, 1547-1569. 

Hsing, T. (1993). Extremal index estimation for a weakly dependent stationary sequence. 
Ann. Statist. 21, 2043-2071. 

Hsing, T., Hüsler, J., and Leadbetter, M.R. (1988). On the exceedance point process for
a stationary sequence. Probab. Theory Relat. Fields 78, 97-112. 

Kesten, H. (1973). Random difference equations and renewal theory for products of 
random matrices. Acta Math. 131, 207-248. 

Leadbetter, M.R. (1983). Extremes and local dependence in stationary sequences. Probab. 
Theory Relat. Fields 65, 291-306. 

Leadbetter, M.R. (1995). On high level exceedance modeling and tail inference. J. Statist. 
Plann. Inference 45, 247-260. 

Leadbetter, M.R., and Rootzén, H. (1993). On central limit theory for families of strongly
mixing additive random functions. In: Stochastic processes: a festschrift in honour of 
Gopinath Kallianpur (S. Cambanis et al., eds.), 211-223. Springer. 

Ledford, A.W., and Tawn, J.A. (2003). Diagnostics for dependence within time series
extremes. J. Roy. Statist. Soc. B 65, 521-543.

Ling, S., and Peng, L. (2004). Hill's estimator for the tail index of an ARMA model. J. 
Statist. Plann. Inference 123, 279-293. 

Mikosch, T., and Samorodnitsky, G. (2000). The supremum of a negative drift random 
walk with dependent heavy-tailed steps. Ann. Appl. Probab. 10, 1025-1064. 

Novak, S.Y. (2002). Inference on heavy tails from dependent data. Siberian Adv. Math. 
12, 73-96. 

Pereira, T.T. (1994). Second order behaviour of domains of attraction and the bias of 
generalized Pickands' estimator. In: Extreme Value Theory and Applications III (J. 
Galambos, J. Lechner and E. Simiu, eds.), 165-177. NIST special publication 866.

Resnick, S., and Stărică, C. (1997). Asymptotic behavior of Hill's estimator for
autoregressive data. Comm. Statist. Stochastic Models 13, 703-721.

Rootzén, H. (1995). The tail empirical process for stationary sequences. Unpublished
manuscript, Chalmers University Gothenburg.

Rootzén, H. (2006). Weak convergence of the tail empirical process for stationary
sequences. Submitted.

Rootzén, H., Leadbetter, M.R., and de Haan, L. (1990). Tail and quantile estimators
for strongly mixing stationary processes. Report, Department of Statistics, University of
North Carolina.

Rootzén, H., Leadbetter, M.R., and de Haan, L. (1998). On the distribution of tail array
sums for strongly mixing stationary sequences. Ann. Appl. Probab. 8, 868-885. 

Segers, J. (2003). Functionals of clusters of extremes. Adv. Appl. Probab. 35, 1028-1045. 

Smith, R.L., and Weissman, I. (1994). Estimating the extremal index. J. Roy. Statist. 
Soc. B 56, 515-528. 

Vervaat, W. (1979). On a stochastic difference equation and a representation of non- 
negative infinitely divisible random variables. Adv. Appl. Probab. 11, 750-783. 

Weissman, I., and Novak, S.Yu. (1998). On blocks and runs estimators of the extremal 
index. J. Statist. Plann. Inference 66, 281-288. 

Yun, S. (2000). The distribution of cluster functionals of extreme events in a dth-order 
Markov chain. J. Appl. Probab. 37, 29-44. 






